
    Layer-wise compressive training for convolutional neural networks

    Convolutional Neural Networks (CNNs) are brain-inspired computational models designed to recognize patterns. Recent advances demonstrate that CNNs are able to achieve, and often exceed, human capabilities in many application domains. Comprising several million parameters, even the simplest CNN has a large model size. This characteristic is a serious concern for deployment on resource-constrained embedded systems, where compression stages are needed to meet stringent hardware constraints. In this paper, we introduce a novel accuracy-driven compressive training algorithm. It consists of a two-stage flow: first, layers are sorted by means of heuristic rules according to their significance; second, a modified stochastic gradient descent optimization is applied to the less significant layers so that their representation is collapsed into a constrained subspace. Experimental results demonstrate that our approach achieves remarkable compression rates with low accuracy loss (<1%).
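
    The sketch below, in PyTorch-style Python, illustrates one way such a two-stage flow could look. The significance heuristic (mean absolute weight) and the constrained subspace (collapsing weights onto k representative values) are illustrative assumptions, not the paper's exact formulation.

        import torch

        def layer_significance(layer):
            # Hypothetical heuristic: mean absolute weight as a significance proxy.
            return layer.weight.abs().mean().item()

        def project_to_subspace(weight, k=8):
            # Assumed constraint: collapse the layer onto k representative values.
            centroids = torch.quantile(weight.flatten(), torch.linspace(0, 1, k))
            idx = (weight.unsqueeze(-1) - centroids).abs().argmin(dim=-1)
            return centroids[idx]

        def compressive_step(model, loss, lr=1e-2, threshold=0.05):
            # Stage 1 happens offline: layers scoring below `threshold` are
            # treated as less significant. Stage 2: SGD step plus projection.
            loss.backward()
            with torch.no_grad():
                for m in model.modules():
                    w = getattr(m, "weight", None)
                    if w is None or w.grad is None:
                        continue
                    w -= lr * w.grad
                    if layer_significance(m) < threshold:
                        w.copy_(project_to_subspace(w))
                    w.grad = None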

    Modeling of thermally induced skew variations in clock distribution network

    The clock distribution network is sensitive to large thermal gradients on the die, as the performance of both clock buffers and interconnects is affected by temperature. A robust clock network design relies on accurate analysis of clock skew subject to temperature variations. In this work, we address the problem of thermally induced clock skew modeling in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects is taken into account. In addition, our characterization of the temperature effect on buffers and interconnects provides valuable insight to designers about the potential impact of thermal variations on clock networks. The use of industry-standard data formats in the interface allows our tool to be easily integrated into existing design flows.
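
    As a toy illustration of how thermal gradients translate into skew, the sketch below sums temperature-dependent buffer and wire delays along two clock branches. The first-order linear models and all coefficients are illustrative assumptions, not characterized values from the paper.

        T_REF = 25.0  # reference temperature, degrees C

        def buffer_delay(t_c, d0=20e-12, k_buf=0.002):
            # Buffer delay grows with temperature (assumed linear model).
            return d0 * (1 + k_buf * (t_c - T_REF))

        def wire_delay(t_c, r0=100.0, c=5e-15, k_r=0.004):
            # Elmore-style RC delay; metal resistance rises with temperature.
            return 0.69 * r0 * (1 + k_r * (t_c - T_REF)) * c

        def branch_delay(temps_along_branch):
            # Sum stage delays along one branch, one (buffer, wire) per node.
            return sum(buffer_delay(t) + wire_delay(t) for t in temps_along_branch)

        # Skew between two sinks whose branches see different thermal profiles.
        skew = branch_delay([80.0, 75.0, 70.0]) - branch_delay([40.0, 38.0, 35.0])
        print(f"thermally induced skew: {skew * 1e12:.2f} ps")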

    Enabling DVFS Side-Channel Attacks for Neural Network Fingerprinting in Edge Inference Services

    The Inference-as-a-Service (IaaS) delivery model provides users access to pre-trained deep neural networks while safeguarding the network code and weights. However, IaaS is not immune to security threats, such as side-channel attacks (SCAs), which exploit unintended information leakage from the physical characteristics of the target device. Exposure to such threats grows when IaaS is deployed on distributed computing nodes at the edge. This work identifies a potential vulnerability of low-power CPUs that facilitates stealing the deep neural network architecture without physical access to the hardware or interference with the execution flow. Our approach relies on a Dynamic Voltage and Frequency Scaling (DVFS) side-channel attack, which monitors the CPU frequency state during the inference stages. Specifically, we introduce a dedicated load-testing methodology that imprints distinguishable signatures of the network on the frequency traces. A machine learning classifier is then used to infer the victim architecture. Experimental results on two commercial ARM Cortex-A CPUs, the A72 and A57, demonstrate the attack can identify the target architecture from a pool of 12 convolutional neural networks with an average accuracy of 98.7% and 92.4%, respectively.
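
    A conceptual sketch of such an attack pipeline is shown below: poll the frequency reported by the Linux cpufreq interface while the victim inference runs, then classify the trace. The polling loop, histogram features, and classifier choice are assumptions for illustration, not the paper's exact methodology.

        import time
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

        def sample_freq_trace(duration_s=2.0, period_s=0.001):
            # Poll the DVFS governor's current frequency (readable without
            # special privileges on many Linux systems), one trace per run.
            trace = []
            end = time.time() + duration_s
            while time.time() < end:
                with open(FREQ_PATH) as f:
                    trace.append(int(f.read()))
                time.sleep(period_s)
            return np.array(trace)

        def featurize(trace, bins=32):
            # Hypothetical features: histogram of visited frequency states.
            hist, _ = np.histogram(trace, bins=bins, density=True)
            return hist

        # X: featurized traces collected for known architectures; y: labels.
        # clf = RandomForestClassifier().fit(X, y)
        # predicted_arch = clf.predict([featurize(sample_freq_trace())])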

    TVFS: Topology Voltage Frequency Scaling for Reliable Embedded ConvNets

    This brief introduces Topology Voltage Frequency Scaling (TVFS), a performance management technique for embedded Convolutional Neural Networks (ConvNets) deployed on low-power CPUs. Using TVFS, pre-trained ConvNets can be efficiently processed over a continuous stream of data, enabling reliable and predictable multi-inference tasks under latency constraints. Experimental results, collected from an image classification task built with MobileNet-v1 and ported onto an ARM Cortex-A15 core, reveal that TVFS sustains fast and continuous inference (from a few runs up to 2000) with limited accuracy loss (from 0.9% to 3.1%) and better thermal profiles (average temperature 16.4 °C below the on-chip critical threshold).
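
    As a rough illustration of latency-constrained operating-point selection in the spirit of TVFS, the sketch below picks, among (topology width, frequency) points that meet a deadline, the most accurate one. The table values and the selection policy are illustrative assumptions, not the paper's measured data.

        OPERATING_POINTS = [
            # (width multiplier, CPU freq in MHz, latency in ms, top-1 accuracy %)
            (1.00, 2000, 33.0, 70.6),
            (1.00, 1200, 52.0, 70.6),
            (0.75, 2000, 21.0, 68.4),
            (0.50, 2000, 12.0, 63.7),
        ]

        def select_point(deadline_ms):
            # Keep every point that meets the deadline, then maximize accuracy.
            feasible = [p for p in OPERATING_POINTS if p[2] <= deadline_ms]
            if not feasible:
                return min(OPERATING_POINTS, key=lambda p: p[2])  # fastest fallback
            return max(feasible, key=lambda p: p[3])

        print(select_point(40.0))  # -> (1.0, 2000, 33.0, 70.6)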

    Graphene-PLA (GPLA): A compact and ultra-low power logic array architecture

    The key characteristics of the next generation of ICs for wearable applications include high integration density, small area, low power consumption, high energy efficiency, reliability, and enhanced mechanical properties such as stretchability and transparency. The proper mix of new materials and novel integration strategies is the enabling factor in achieving those design specifications. Moving toward this goal, we introduce a graphene-based regular logic-array structure for energy-efficient digital computing. It consists of graphene p-n junctions arranged into a regular mesh. The obtained structure resembles that of Programmable Logic Arrays (PLAs), hence the name Graphene-PLAs (GPLAs); the high expressive power of graphene p-n junctions and their resistive nature enable the implementation of ultra-low power adiabatic logic circuits.
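
    For readers unfamiliar with the PLA model that GPLAs resemble, the sketch below evaluates a two-plane (AND/OR) logic array in plain Python. It illustrates the regular-array computation only and does not model the electrical behavior of graphene p-n junctions.

        def pla_eval(inputs, and_plane, or_plane):
            # and_plane: per product term, a dict {input_index: required_value}.
            terms = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
            # or_plane: per output, the list of product-term indices it ORs.
            return [any(terms[t] for t in out) for out in or_plane]

        # Example: f0 = (a AND NOT b) OR (b AND c) over inputs (a, b, c).
        and_plane = [{0: 1, 1: 0}, {1: 1, 2: 1}]
        or_plane = [[0, 1]]
        print(pla_eval([1, 0, 0], and_plane, or_plane))  # [True]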

    Dynamic ConvNets on Tiny Devices via Nested Sparsity

    This work introduces a new training and compression pipeline to build nested sparse convolutional neural networks (ConvNets), a class of dynamic ConvNets suited for inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A nested sparse ConvNet consists of a single ConvNet architecture containing N sparse subnetworks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at runtime, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that exploit the characteristics of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM Cortex-M7 microcontroller unit (MCU), nested sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving 1) comparable accuracy, 2) remarkable storage savings, and 3) high performance. Moreover, when compared to state-of-the-art dynamic strategies, such as dynamic pruning and layer width scaling, nested sparse ConvNets turn out to be Pareto-optimal in the accuracy versus latency space.
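
    The sketch below illustrates the nesting property with magnitude-based masks: the weights surviving at a higher sparsity level are a subset of those surviving at every lower level (the "Matryoshka" property). The masking rule is an illustrative stand-in, not the paper's gradient masking technique or compression format.

        import torch

        def nested_masks(weight, sparsities=(0.9, 0.7, 0.5)):
            # Thresholding the same magnitude ranking at decreasing sparsity
            # levels guarantees the nesting m_0.9 is a subset of m_0.7, etc.
            flat = weight.abs().flatten()
            masks = []
            for s in sparsities:
                k = int((1 - s) * flat.numel())  # number of surviving weights
                thresh = flat.topk(k).values.min()
                masks.append((weight.abs() >= thresh).float())
            return masks

        w = torch.randn(64, 64)
        m90, m70, m50 = nested_masks(w)
        assert torch.all(m90 <= m70) and torch.all(m70 <= m50)  # nested subsets

        # At runtime, sparsity is the dynamic knob: pick a mask per latency budget.
        fast_weights, accurate_weights = w * m90, w * m50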